Batch Load

Introduction

GigaSpaces now provides the ability for Smart DIH to define batch loads via a standard pipeline interface. Batch load can now be performed without the use of IIDR (IBM InfoSphere Data Replication).

Configuring Batch Load: Helm

Enabling

Batch load is enabled through Kubernetes orchestration. It is not enabled by default.

The following flag must be added to the helm command: global.batchload.enabled=true.
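As a minimal sketch, the flag can be passed with --set when installing the umbrella chart. The release name, repo name, and chart name below are illustrative assumptions; only the flag itself comes from this documentation:

```shell
# Illustrative install of the DIH umbrella chart with batch load enabled.
# "dih" (release), "dih-repo/dih" (chart) are placeholders for your environment.
helm install dih dih-repo/dih \
  --set global.batchload.enabled=true
```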

Adding the Agent

For each data source created, a separate Batch Load agent must be installed.  GigaSpaces also provides a separate helm chart for installing a batch load agent outside of the umbrella.  This is used when a client requires more than one agent, for example, when there are multiple Oracle databases.

To install an agent under the DIH umbrella: global.batchload-agent.enabled=true

To install an agent and control its name: global.batchload-agent.agent.name=[name of agent].
It is also possible to install the batch load agent outside of the helm umbrella, for the case where a client needs more than one agent (for example, for multiple Oracle databases): helm install di-agent [dih repo name]/di-agents --version 2.0.0 --set agent.name=[name of agent]
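The umbrella flags above can be combined in a single install. The release and chart names in this sketch are assumptions; the flags and the example agent name are the only parts taken from, or modeled on, the text above:

```shell
# Illustrative umbrella install with batch load and a named agent enabled.
# "dih" (release), "dih-repo/dih" (chart), and "oracle-agent-1" are placeholders.
helm install dih dih-repo/dih \
  --set global.batchload.enabled=true \
  --set global.batchload-agent.enabled=true \
  --set global.batchload-agent.agent.name=oracle-agent-1
```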

Supported Data Sources and Loading Types

Currently, GigaSpaces supports the ability to perform full batch load from an Oracle DB.  More data sources and loading types will be added in future releases.

Creating a Data Source for Batch Load

Batch Load cannot be configured for a pipeline that is configured and running with CDC (IIDR).  To enable Batch Load, the appropriate configuration must be used when creating the Data Source.

To use Batch Load when creating a Pipeline, add a new Pipeline by following the steps outlined in the User Guide: SpaceDeck - Spaces - Adding a Pipeline for Batch Load.

User Flows: Creating a Pipeline using Batch Load

Batch Load cannot be configured for a pipeline that is configured and running with CDC (IIDR).  To enable Batch Load, a new pipeline must be created.

Oracle Database: Define Basic Full Batch Load Pipeline

  1. Log in to SpaceDeck.

  2. Define Oracle as the Data Source with the connector type = BATCHLOAD.

  3. Create a new pipeline.

A full batch load ends after the full load is completed, at which point the pipeline status should be Completed. This differs from a CDC pipeline, which continues to run and capture changes.